Abstract

A hitting set for a collection of sets is a set that has a non-empty intersection with each 
set in the collection; the hitting set problem is to find a hitting set of minimum cardinality. 
Motivated by instances of the hitting set problem where the number of sets to be hit is large, 
we introduce the notion of implicit hitting set problems. In an implicit hitting set problem the 
collection of sets to be hit is typically too large to list explicitly; instead, an oracle is provided 
which, given a set H, either determines that H is a hitting set or returns a set that H does 
not hit. We show a number of examples of classic implicit hitting set problems, and give a 
generic algorithm for solving such problems optimally. The main contribution of this paper is 
to show that this framework is valuable in developing approximation algorithms. We illustrate 
this methodology by presenting a simple on-line algorithm for the minimum feedback vertex 
set problem on random graphs. In particular, our algorithm gives a feedback vertex set of size
$n - (1/p)\log(np)(1 - o(1))$ with probability at least 3/4 for the random graph $G_{n,p}$ (the smallest
feedback vertex set is of size $n - (2/p)\log(np)(1 + o(1))$). We also consider a planted model
for the feedback vertex set in directed random graphs. Here we show that a hitting set for a 
polynomial-sized subset of cycles is a hitting set for the planted random graph and this allows 
us to exactly recover the planted feedback vertex set. 



1 Introduction 

In the classic Hitting Set problem, we are given a universe $U$ of elements and a collection $\mathcal{T}$
of subsets $S_1, \dots, S_m$ of $U$; the objective is to find a subset $H \subseteq U$ of minimum cardinality so
that every subset $S_i$ in $\mathcal{T}$ contains at least one element from $H$. The problem is NP-hard [Kar72],
approximable to within $\log_2 |U|$ using a greedy algorithm, and has been studied for many interesting
special cases.
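As a point of reference for the explicit problem, the greedy rule can be sketched as follows. This is a minimal illustration, assuming the collection is small enough to list; the function name `greedy_hitting_set` and the tie-breaking rule are illustrative choices, not part of the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_hitting_set(subsets: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Repeatedly pick the element that hits the most not-yet-hit subsets."""
    remaining: List[Set[Hashable]] = [set(s) for s in subsets]
    if any(len(s) == 0 for s in remaining):
        raise ValueError("an empty subset can never be hit")
    hitting: Set[Hashable] = set()
    while remaining:
        # Count, for every element, how many unhit subsets contain it.
        counts: Dict[Hashable, int] = {}
        for s in remaining:
            for x in s:
                counts[x] = counts.get(x, 0) + 1
        # Ties are broken by repr(x) only to make the sketch deterministic.
        best = max(counts, key=lambda x: (counts[x], repr(x)))
        hitting.add(best)
        remaining = [s for s in remaining if best not in s]
    return hitting
```

Every pass scans all remaining subsets, which is exactly the cost that becomes prohibitive when the collection is exponentially large.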

There are instances of the hitting set problem where the number of subsets $|\mathcal{T}|$ to hit is
exponential in the size of the universe. Consequently, obtaining a hitting set with approximation
factor $\log_2 |U|$ using the greedy algorithm, which examines all subsets, is unreasonable for practical
applications. Our motivation is the possibility of algorithms that run in time polynomial in the
size of the universe. In this paper, we introduce a framework that could be useful in developing 
efficient approximation algorithms for instances of the hitting set problem with exponentially many 
subsets to hit. 

We observe that in many combinatorial problems, $\mathcal{T}$ has a succinct representation that allows
efficient verification of whether a candidate set hits every subset in $\mathcal{T}$. Formally, in an implicit
hitting set problem, the input is a universe $U$ and a polynomial-time oracle that, given a set $H$,
either determines that $H$ is a hitting set or returns a subset that is not hit by $H$. Thus, the collection
$\mathcal{T}$ of subsets to hit is not specified explicitly. The objective is to find a small hitting set by making
at most polynomial($|U|$) queries to the oracle. In Section 1.1, we show several well-known problems
that can be formulated as implicit hitting set problems.

We present a generic algorithm to obtain the optimal solution of implicit hitting set problems 
in Section 2. As this algorithm solves optimally the NP-hard (classic) hitting set problem as a
subroutine, its worst-case running time is exponential as a function of $|U|$. The main purpose of
stating the generic algorithm is to develop an intuition towards using the oracle. It suggests a 
natural way to use the oracle: first (1) propose a candidate hitting set H, then (2) use the oracle 
to check if the candidate set hits all the subsets, and if not obtain a subset S that has not been 
hit, and finally (3) refine H based on S and repeat until a hitting set is found. 

The generic algorithm for the implicit hitting set problem is in fact a generalization of online 
algorithms for hitting set problems. Here, the ground set is specified in advance as before and 
the subsets to be hit arrive online. On obtaining a subset, the algorithm has to decide which new 
element to include in the hitting set and commit to the element. Thus, the online algorithm is 
restricted in that the refinement procedure can only add elements. Moreover, only those subsets 
that have not been hit by the candidate set are revealed online thereby saving the algorithm from 
having to examine all subsets in $\mathcal{T}$. This is similar to the mistake bound learning model [Lit88].

We apply the implicit hitting set framework and specialize the generic algorithm to the Minimum 
Feedback Vertex Set (FVS) problem: given a graph $G(V, E)$, find a subset $S \subseteq V$ of smallest
cardinality so that every cycle in the graph contains at least one vertex from $S$. Although the
number of cycles could be exponential in the size of the graph, one can efficiently check whether
a proposed set $H$ hits all cycles (i.e., is a feedback vertex set) or find a cycle that is not hit by
H using a breadth-first search procedure after removing the subset of vertices H from the graph. 
The existence of a polynomial time oracle shows that it is an instance of the implicit hitting set 
problem. 
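One possible form of such an oracle for undirected simple graphs is sketched below. It assumes an adjacency-set representation with no self-loops and no vertex labelled `None`; the names `fvs_oracle` and `_cycle_from_tree` are hypothetical, and the particular cycle returned depends on iteration order rather than on the specific BFS ordering analyzed later.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected simple graph


def _cycle_from_tree(parent, u, v) -> List[Hashable]:
    """Close the non-tree edge (u, v) through the BFS tree."""
    path_u = []
    x = u
    while x is not None:          # walk from u up to the root
        path_u.append(x)
        x = parent[x]
    index = {x: i for i, x in enumerate(path_u)}
    path_v = []
    x = v
    while x not in index:         # walk from v up to the first common ancestor
        path_v.append(x)
        x = parent[x]
    return path_u[: index[x] + 1] + path_v[::-1]


def fvs_oracle(graph: Graph, H: Set[Hashable]) -> Optional[List[Hashable]]:
    """Return None if H hits every cycle, else the vertices of a cycle avoiding H."""
    parent: Dict[Hashable, Optional[Hashable]] = {}
    for root in graph:
        if root in H or root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v in H or v == parent[u]:
                    continue
                if v in parent:
                    # A visited neighbour other than the parent closes a cycle.
                    return _cycle_from_tree(parent, u, v)
                parent[v] = u
                queue.append(v)
    return None
```

Each call runs in time linear in the size of the graph, which is what makes the implicit formulation usable even when the number of cycles is exponential.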

The main focus of this paper is to develop algorithms that find nearly optimal hitting sets 
in random graphs or graphs with planted feedback vertex sets, by examining only a polynomial 
number of cycles. For this to be possible, we need the oracle to pick cycles that have not yet been 
hit in a natural yet helpful manner. If the oracle is adversarial, this could force the algorithm to 
examine almost all cycles. We consider two natural oracles: one that picks cycles in breadth-first
search (BFS) order and another that picks cycles according to their size.

We prove that if cycles in the random graph $G_{n,p}$ are obtained in a breadth-first search ordering,
there is an efficient algorithm that examines a polynomial collection $\mathcal{T}'$ of cycles to build a nearly
optimal feedback vertex set for the graph. The algorithm builds a solution iteratively by (1) 
proposing a candidate for a feedback vertex set in each iteration, (2) finding the next cycle that 
is not hit in a breadth-first ordering of all cycles, (3) augmenting the proposed set and repeating. 
A similar result for directed random graphs using the same algorithm follows by ignoring the 
orientation of the edges. Our algorithm is an online algorithm, i.e., it commits to only adding and
not deleting vertices from the candidate feedback vertex set. 

It is evident from our results that the size of the feedback vertex set in both directed and 
undirected random graphs is close to n, for sufficiently large p. This motivates us to ask if a smaller 
planted feedback vertex set in random graphs can be recovered by using the implicit hitting set 
framework. This question is similar in flavor to the well-studied planted clique problem [Jer92,
AKS98, FK08], but posed in the framework of implicit hitting set problems. We consider a natural
planted model for the feedback vertex set problem in directed graphs. In this model, a subset of
$\delta n$ vertices, for some constant $0 < \delta < 1$, is chosen to be the feedback vertex set. The subgraph
induced on the complement is a random directed acyclic graph (DAG) and all the other arcs are 
chosen with probability p independently. The objective is to recover the planted feedback vertex 
set. We prove that the optimal hitting set for cycles of bounded size is the planted feedback vertex 
set. Consequently, ordering the cycles according to their sizes and finding an approximately optimal 
hitting set for the small cycles is sufficient to recover the planted feedback vertex set. This also 
leads to an online algorithm when cycles are revealed in increasing order of their size with ties
broken arbitrarily.

We conclude this section with some well-known examples of implicit hitting set problems. 

1.1 Implicit Hitting Set Problems 

An implicit hitting set problem is one in which, for each instance, the set of subsets is not listed 
explicitly but instead is specified implicitly by an oracle: a polynomial-time algorithm which, given 
a set $H \subseteq U$, either certifies that $H$ is a hitting set or returns a subset that is not hit by $H$.
Each of the following is an implicit hitting set problem: 

• Feedback Vertex Set in a Graph or Digraph 

Ground Set: Set of vertices of graph or digraph G. 
Subsets: Vertex sets of simple cycles in G. 

• Feedback Edge Set in a Digraph 

Ground Set: Set of edges of digraph G. 
Subsets: Edge sets of simple cycles in G. 

• Max Cut 

Ground Set: Set of edges of graph G. 
Subsets: Edge sets of simple odd cycles in G. 

• k-Matroid Intersection 

Ground Set: Common ground set of k matroids. 
Subsets: Subsets in the k matroids. 

• Maximum Feasible Set of Linear Inequalities 

Ground Set: A finite set of linear inequalities. 

Subsets: Minimal infeasible subsets of the set of linear inequalities. 

• Maximum Feasible Set of Equations of the Form $x_i - x_j = c_{ij} \pmod{q}$

This example is motivated by the Unique Games Conjecture. 

• Synchronization in an Acyclic Digraph 

Ground Set: A collection U of pairs of vertices drawn from the vertex set of an acyclic digraph 
G. 

Subsets: Minimal collection C of pairs from U with the property that, if each pair in C is 
contracted to a single vertex, then the resulting digraph contains a cycle. 

Organization. In Section 2, we present a generic algorithm for the optimal solution of implicit 
hitting set problems. Then, we focus on specializing this algorithm to obtain small feedback vertex 
sets in directed and undirected random graphs. We analyze the performance of this algorithm 
in Section 3. We then consider a planted model for the feedback vertex set problem in directed 
random graphs. In Section 4, we give an algorithm to recover the planted feedback vertex set by 
finding an approximate hitting set for a polynomial-sized subset of cycles. We prove a lower bound 
for the size of the feedback vertex set in random graphs in Section 5. We state our results more 
precisely in the next section. 

1.2 Results for Feedback Vertex Set Problems 

We consider the feedback vertex set problem for the random graph $G_{n,p}$, a graph on $n$ vertices
in which each edge is chosen independently with probability $p$. Our main result here is that a
simple augmenting approach based on ordering cycles according to a breadth-first search (Algorithm
Augment-BFS described in the next section) has a strong performance guarantee.

Theorem 1. For $G_{n,p}$, such that $p = o(1)$, there exists a polynomial time algorithm that produces
a feedback vertex set of size at most $n - (1/p)\log(np)(1 - o(1))$ with probability at least 3/4.

Throughout, $o(1)$ is with respect to $n$. We complement our upper bound with a lower bound
on the feedback vertex set for $G_{n,p}$ obtained using simple union bound arguments.

Theorem 2. Let $r = (2/p)\log(np)(1 + o(1)) + 1$. If $p < 1/2$, then every subgraph induced by any
subset of $r$ vertices in $G_{n,p}$ contains a cycle with high probability.

This gives an upper bound of $r - 1$ on the size of the maximum induced acyclic subgraph of $G_{n,p}$. So,
the size of the minimum feedback vertex set for $G_{n,p}$ is at least $n - r + 1 = n - (2/p)\log(np)(1 + o(1))$.
A result of Fernandez de la Vega [FdlV96] shows that $G_{n,p}$ has an induced tree of size at least
$(2/p)\log(np)(1 - o(1))$, when $p = o(1)$. This gives the best possible existential result: there exists a
feedback vertex set of size at most $n - (2/p)\log(np)(1 - o(1))$ with high probability in $G_{n,p}$, when
$p = o(1)$. We note that this result is not algorithmic; Fernandez de la Vega gives a greedy algorithm
to obtain an induced tree of size $(1/p)\log(np)(1 - o(1))$ in [FdlV86]. This algorithm is based
on growing the induced forest from the highest labeled vertex and does not fall in the implicit 
hitting set framework (when the graph is revealed as a set of cycles). In contrast, our main 
contribution to the FVS problem in random graphs is showing that a simple breadth-first ordering 
of the cycles is sufficient to find a nearly optimal feedback vertex set. We also note that our 
algorithm is an online algorithm with good performance guarantee when the cycles are revealed 
according to a breadth-first ordering. Improving on the size of the FVS returned by our algorithm 
appears to require making progress on the long-standing open problem of finding an independent 
set of size $((1 + \epsilon)/p)\log(np)$ in $G_{n,p}$. Assuming an optimal algorithm for this problem leads to an
asymptotically optimal guarantee matching Fernandez de la Vega's existential bound. 

Next, we turn our attention to the directed random graph $D_{n,p}$ on $n$ vertices. The directed
random graph $D_{n,p}$ is obtained as follows: choose a set of undirected edges joining distinct elements
of $V$ independently with probability $2p$. For each chosen undirected edge $\{u, v\}$, orient it in one of
the two directions $\{u \to v, v \to u\}$ in $D_{n,p}$ with equal probability.

The undirected graph $G_D$ obtained by ignoring the orientation of the edges in $D_{n,p}$ is the
random graph $G(n, 2p)$. Moreover, a feedback vertex set in $G_D$ is also a feedback vertex set for
$D_{n,p}$. Therefore, by ignoring the orientation of the arcs, the Augment-BFS algorithm as applied to
undirected graphs can be used to obtain a feedback vertex set of size at most $n - (1/2p)\log(2np)$
with probability at least 3/4. A theorem of Spencer and Subramanian [SS08] gives a nearly matching
lower bound on the size of the feedback vertex set in $D_{n,p}$.

Theorem 3. [SS08] Consider the random graph $D_{n,p}$, where $np > W$, for some fixed constant $W$.
Let $r = (2/\log(1 - p)^{-1})(\log(np) + 3e)$. Every subgraph induced by any subset of $r$ vertices in $G$
contains a cycle with high probability.

It is evident from the results above that the feedback vertex set in a random graph contains 
most of its vertices when $p = o(1)$. This motivates us to ask if a significantly smaller "planted"
feedback vertex set in a random graph can be recovered with the implicit hitting set framework. 
In order to address this question, we present the following planted model. 

The planted directed random graph $D_{n,\delta,p}$ on $n$ vertices for $0 < \delta < 1$ is obtained as follows:
Choose $\delta n$ vertices arbitrarily to be the planted subset $P$. Each pair $(u, v)$ where $u \in P$, $v \in V$,
is adjacent independently with probability $2p$ and the corresponding edge is oriented in one of the
two directions $\{u \to v, v \to u\}$ in $D_{n,\delta,p}$ with equal probability. The arcs between vertices in $V \setminus P$
are obtained in the following manner to ensure that the subgraph induced on $V \setminus P$ is a DAG:
Pick an arbitrary permutation of the vertices in V\P. With the vertices ordered according to this 
permutation, each forward arc is present with probability p independently; no backward arcs occur 
according to this ordering. 
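The construction above can be sampled with a short routine such as the following sketch. It assumes vertices are labelled 0 to n-1, picks the planted set and the permutation uniformly at random (the model allows arbitrary choices), and requires 2p to be at most 1; the function name `planted_digraph` and the `seed` parameter are illustrative.

```python
import random
from typing import List, Set, Tuple


def planted_digraph(n: int, delta: float, p: float, seed: int = 0
                    ) -> Tuple[Set[Tuple[int, int]], Set[int], List[int]]:
    """Sample arcs of a planted digraph; returns (arcs, planted set P, DAG order)."""
    rng = random.Random(seed)
    vertices = list(range(n))
    planted = set(rng.sample(vertices, int(delta * n)))
    order = [v for v in vertices if v not in planted]
    rng.shuffle(order)                      # permutation of V \ P
    arcs: Set[Tuple[int, int]] = set()
    # Pairs touching P: present with probability 2p, then oriented uniformly.
    for u in vertices:
        for v in range(u + 1, n):
            if (u in planted or v in planted) and rng.random() < 2 * p:
                arcs.add((u, v) if rng.random() < 0.5 else (v, u))
    # Pairs inside V \ P: only forward arcs with respect to the permutation.
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            if rng.random() < p:
                arcs.add((order[i], order[j]))
    return arcs, planted, order
```

By construction every arc between two vertices outside the planted set points forward in `order`, so removing the planted set leaves a DAG.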
Figure 1: Planted Model

We prove that for graphs $D_{n,\delta,p}$, for large enough $p$, it is sufficient to hit cycles of small size to
recover the planted feedback vertex set. For example, if $p > C_0/n^{1/3}$ for some absolute constant
$C_0$, then it is sufficient to find the best hitting set for triangles in $D_{n,\delta,p}$. This would be the planted
feedback vertex set. We state the theorem for cycles of length $k$.

Theorem 4. Let $D$ be a planted directed random graph $D_{n,\delta,p}$ with planted feedback vertex set $P$,
where $p > C/n^{1-2/k}$ for some constant $C$ and $0 < \delta < 9/19$. Then, with high probability, the
smallest hitting set for the set of cycles of size $k$ in $D$ is the planted feedback vertex set $P$.

Thus, in order to recover the planted feedback vertex set, it is sufficient to obtain cycles in
increasing order of their sizes and find the best hitting set for the subset of all cycles of size $k$.
Moreover, the expected number of cycles of length $k$ is at most $(nkp)^k = \mathrm{poly}(n)$ for the mentioned
range of $p$ and constant $k$. Thus, we have a polynomial-sized collection $\mathcal{T}'$ of cycles, such that the
optimal hitting set for $\mathcal{T}'$ is also the optimal hitting set for all cycles in $D_{n,\delta,p}$.
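For the case k = 3, the polynomial-sized collection can be listed explicitly. The sketch below enumerates directed triangles and checks whether a candidate set hits them; it assumes integer vertex labels and arcs given as ordered pairs, and the helper names `directed_triangles` and `hits_all` are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def directed_triangles(arcs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """List each directed triangle u -> v -> w -> u once, starting at its smallest vertex."""
    succ: Dict[int, Set[int]] = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
    triangles = []
    for u, outs in succ.items():
        for v in outs:
            for w in succ.get(v, ()):
                # u < v and u < w selects exactly one of the three rotations.
                if u < v and u < w and v != w and u in succ.get(w, ()):
                    triangles.append((u, v, w))
    return triangles


def hits_all(H: Set[int], cycles: Iterable[Tuple[int, ...]]) -> bool:
    """True if every listed cycle contains at least one vertex of H."""
    return all(any(x in H for x in cyc) for cyc in cycles)
```

Feeding the resulting list to any explicit hitting set routine then plays the role of hitting the collection of small cycles.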

However, finding the smallest hitting set is NP-hard even for triangles. We give an efficient 
algorithm to recover the planted feedback vertex set using an approximate hitting set for the small 
cycles. 

Theorem 5. Let $D$ be a planted directed random graph $D_{n,\delta,p}$ with planted feedback vertex set $P$,
where $p > C/n^{1-2/k}$ for some constant $C$ and $k \ge 3$, $0 < \delta < 1/2k$. Then, there exists an algorithm
to recover the planted feedback vertex set $P$ with high probability; this algorithm has an expected
running time of $(nkp)^{O(k)}$.

2 Algorithms 

In this section, we present a generic algorithm for implicit hitting set problems. We then focus on
specializing this algorithm to the feedback vertex set problems in directed and undirected graphs. 

2.1 A Generic Algorithm 

We present a generic algorithm for solving instances of the implicit hitting set problem optimally
with the aid of an oracle and a subroutine for the exact solution of (explicit) hitting set problems. 
The guiding principle is to build up a short list of important subsets that dictate the solution, while 
limiting the number of times the subroutine is invoked, since its computational cost is high. 

A set $H \subseteq U$ is called feasible if it is a hitting set for the implicit hitting set problem, and
optimal if it is feasible and of minimum cardinality among all feasible hitting sets. Whenever the
oracle reveals that a set $H$ is not feasible, it returns $c(H)$, a subset that $H$ does not hit. Each
generated subset $c(H)$ is added to a growing list $\Gamma$ of subsets. A set $H$ is called $\Gamma$-feasible if it hits
every subset in $\Gamma$ and $\Gamma$-optimal if it is $\Gamma$-feasible and of minimum cardinality among all $\Gamma$-feasible
subsets. If a $\Gamma$-optimal set $K$ is feasible then it is necessarily optimal, since $K$ is a valid hitting set
for the implicit hitting set problem, whose collection contains the subsets in $\Gamma$, and $K$ is a minimum hitting set
for the subsets in $\Gamma$. Thus the goal of the algorithm is to construct a feasible $\Gamma$-optimal set.

Generic Algorithm
Initialize $\Gamma \leftarrow \emptyset$.

1. Repeat:

(a) $H \leftarrow U$.

(b) Repeat while there exists a $\Gamma$-feasible set $H' = (H \cup X) - Y$ such that $X, Y \subseteq U$,
$|X| < |Y|$:

i. If $H'$ is feasible then $H \leftarrow H'$; else $\Gamma \leftarrow \Gamma \cup \{c(H')\}$.

(c) Construct a $\Gamma$-optimal set $K$.

(d) If $|H| = |K|$ then return $H$ and halt ($H$ is optimal); if $K$ is feasible then return $K$ and
halt ($K$ is optimal); else $\Gamma \leftarrow \Gamma \cup \{c(K)\}$.

Remark 1. Since the generic algorithm solves optimally an NP-hard problem as a subroutine, its
worst-case execution time is exponential in $|U|$. Its effectiveness in practice depends on the choice
of the missed subset that the oracle returns.
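A simplified variant of the generic algorithm can be sketched in a few lines. This sketch keeps only steps (c) and (d), omitting the local-improvement loop of step (b), and uses brute-force enumeration as the exact explicit hitting set subroutine; the names `min_hitting_set` and `implicit_hitting_set` and the oracle signature (return `None` when feasible, else a missed subset) are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Optional, Sequence, Set, Tuple

Oracle = Callable[[Set[Hashable]], Optional[Set[Hashable]]]


def min_hitting_set(universe: Sequence[Hashable],
                    subsets: List[Set[Hashable]]) -> Set[Hashable]:
    """Exact explicit hitting set by enumeration (exponential in |U|)."""
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            chosen = set(cand)
            if all(chosen & s for s in subsets):
                return chosen
    raise ValueError("some subset cannot be hit from the universe")


def implicit_hitting_set(universe: Sequence[Hashable], oracle: Oracle
                         ) -> Tuple[Set[Hashable], List[Set[Hashable]]]:
    """Return an optimal hitting set and the list of subsets that was generated."""
    gamma: List[Set[Hashable]] = []        # subsets revealed by the oracle so far
    while True:
        K = min_hitting_set(universe, gamma)   # optimal for the revealed subsets
        missed = oracle(K)
        if missed is None:
            return K, gamma                # feasible and optimal for gamma
        gamma.append(set(missed))          # K misses it, so it is a new subset
```

The loop terminates because every returned subset is missed by the current candidate and is therefore new, and the returned set is optimal by the argument given before the algorithm.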

A companion paper [KMC] describes successful computational experience with an algorithm
that formulates a multi-genome alignment problem as an implicit hitting set problem, and solves 
it using a specially tailored variant of the generic algorithm. 



Algorithm Augment-BFS 

1. Start from an arbitrary vertex as a surviving vertex. Initialize i = 1.

2. Repeat:

(a) Obtain cycles induced by one step BFS-exploration of the surviving
vertices at depth i. Delete vertices at depth i+1 that are present in
these cycles. Declare the remaining vertices at depth i+1 as surviving
vertices.

(b) If no vertices at depth i+1 are surviving vertices, terminate and output
the set of all deleted vertices.

(c) i = i+1.



2.2 Algorithm Augment-BFS 

In this section, we give an algorithm to find the feedback vertex set in both undirected and directed 
graphs. Here, we use an oracle that returns cycles according to a breadth-first search ordering. 
Instead of the exact algorithm for the (explicit) hitting set problem, as suggested in the generic 
algorithm, we use a simpler strategy of picking a vertex from each missed cycle. Essentially, the 
algorithm considers cycles according to a breadth-first search ordering and maintains an induced 
tree on a set of vertices denoted as surviving vertices. The vertices deleted in the process will 
constitute a feedback vertex set. Having built an induced tree on surviving vertices up to a certain 
depth i, the algorithm is presented with cycles obtained by a one-step BFS exploration of the 
surviving vertices at depth i. For each such cycle, the algorithm picks a vertex at depth i + 1 to 
delete. The vertices at depth i + 1 that are not deleted are added to the set of surviving vertices, 
thereby leading to an induced tree on surviving vertices up to depth i + 1.

Remark 2. Although a very similar algorithm can be used for other variants of the feedback set
problem, we note that these problems in random graphs turn out to be easy. For example, the 
feedback edge set problem is equivalent to the maximum spanning tree problem, while the feedback 
arc set problem has tight bounds for random graphs using very simple algorithms. 

3 Feedback Vertex Set in Random Graphs 

In this section, we show that Augment-BFS can be used to find a nearly optimal feedback vertex set 
in the undirected random graph $G_{n,p}$. Our main contribution is a rigorous analysis of the heuristic
of simple cycle elimination in BFS order. We say that a vertex v is a unique neighbor of a subset 
of vertices L if and only if v is adjacent to exactly one vertex in L. 

In Algorithm Augment-BFS, we obtain induced cycles in BFS order after deleting the vertices
of the current candidate FVS $S$. We refine the candidate FVS $S$ as follows to obtain an
induced BFS tree with unit increase in height. Consider the set $c(S)$ of cycles obtained by one-step
BFS exploration from the set of vertices at the current depth. Let $K$ denote the set of unexplored
vertices in the cycles in $c(S)$ ($K$ is a subset of the vertices obtained by one-step BFS exploration
from the set of vertices at the current depth). Among the vertices in $K$, add to $S$ all non-unique neighbors
of the set of vertices at the current depth. Find a large independent set in the subgraph induced
by the unique neighbors $R \subseteq K$ of the set of vertices at the current depth. Add to $S$ all vertices in $R$
that are not in the independent set. This iterative refinement process is a natural adaptation
of the idea behind the generic algorithm to the feedback vertex set problem, where one collects a
subset of cycles to find a hitting set $H$ for these cycles and proposes $H$ as the candidate set to
obtain more cycles that have not been hit.

Figure 2: BFS Exploration (showing unique neighbors and non-unique neighbors)

Essentially, the algorithm maintains an induced BFS tree by deleting vertices to remove cycles. 
The set of deleted vertices forms an FVS. Consequently at each level of the BFS exploration, one
would prefer to add as many vertices from the next level K as possible maintaining the acyclic 
property. One way to do this is as follows: Delete all the non-unique neighbors of the current 
level from K thus hitting all cycles across the current and next level. There could still be cycles 
using an edge through the unique neighbors. To hit these, add a large independent set from the 
subgraph induced by the unique neighbors and delete the rest. Observe that this induced subgraph 
is a random graph on a smaller number of vertices. However, even for random graphs, it is open 
to find the largest independent set efficiently and only a factor 2 approximation is known. 

In our analysis, instead of using the 2-approximate algorithm for the independent set problem,
we use the simple heuristic of deleting a vertex for each edge that is present in the subgraph to find
an independent set at each level. In order to lower bound the size of the induced tree, it suffices
to consider growing the BFS-tree up to a certain height $T$ using this heuristic and then using the
2-approximate algorithm for independent set at height $T$ to terminate the algorithm. The size of
the induced tree obtained using Algorithm Augment-BFS is at least as large as the one produced
by the process just described. To simplify our analysis, it will be useful to restate the algorithm as
Algorithm Grow-induced-BFS.



Algorithm Grow-induced-BFS

1. Start from an arbitrary vertex $v$ at level 0, set $L_0 = \{v\}$. Mark $v$ as
exposed. Fix $c := np$.

2. Explore levels $i = 0, \dots, T - 1$, where
$$T = \frac{\ln(1/16p) - \ln\ln(1/16p)}{\ln(c + 20\sqrt{c})},$$
in BFS order as follows:

(a) Let $K_{i+1}$ be the subset of neighbors of $L_i$ among the unexposed vertices,
where $L_i$ is the set of surviving vertices at level $i$.

(b) Mark the vertices in $K_{i+1}$ as exposed.

(c) Let $R_{i+1} \subseteq K_{i+1}$ be the subset of vertices in $K_{i+1}$ that are unique
neighbors of $L_i$.

(d) For every edge $(u, v)$ that is present between vertices $u, v \in R_{i+1}$, add
either $u$ or $v$ to $W_{i+1}$.

(e) Set $L_{i+1} = R_{i+1} \setminus W_{i+1}$.

(The set of surviving vertices at level $i + 1$, namely $L_{i+1}$, is an
independent set in the subgraph induced by $R_{i+1}$.)

3. On obtaining $L_{T-1}$, set $R_T$ = unique neighbors of $L_{T-1}$ among the unexposed
vertices. In the subgraph induced by $R_T$, find an independent set $L_T$ as
follows.

(a) Fix an arbitrary ordering of the vertices of $R_T$. Repeat while $R_T \neq \emptyset$:

• Add the next vertex $v \in R_T$ to $L_T$. Let $N(v)$ = neighbors of $v$ in $R_T$.
Set $R_T \leftarrow R_T \setminus (N(v) \cup \{v\})$.

4. Return $S = V \setminus \bigcup_{i=0}^{T} L_i$ as the feedback vertex set.
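The steps of Algorithm Grow-induced-BFS can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes an undirected simple graph given as adjacency sets, takes the height `T` as a caller-supplied integer instead of computing it from the formula, applies the edge-deletion heuristic in the first T-1 rounds, and applies the greedy independent set of step 3 in the last round.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected simple graph, no self-loops


def grow_induced_bfs(graph: Graph, start: Hashable, T: int) -> Set[Hashable]:
    """Return S = V minus the surviving levels; the kept vertices induce a tree."""
    exposed = {start}
    kept = {start}     # union of the surviving levels found so far
    level = {start}    # current surviving level L_i
    for i in range(T):
        # K_{i+1}: unexposed neighbours of L_i, which now become exposed.
        K = {v for u in level for v in graph[u] if v not in exposed}
        exposed |= K
        # R_{i+1}: vertices of K_{i+1} with exactly one neighbour in L_i.
        R = {v for v in K if len(graph[v] & level) == 1}
        dropped: Set[Hashable] = set()
        if i < T - 1:
            # Steps 2(d)-(e): for every edge inside R, drop one endpoint.
            for u in R:
                for v in graph[u] & R:
                    if u not in dropped and v not in dropped:
                        dropped.add(v)
        else:
            # Step 3: greedy independent set in the subgraph induced by R_T.
            chosen: Set[Hashable] = set()
            for v in R:
                if not (graph[v] & chosen):
                    chosen.add(v)
            dropped = R - chosen
        level = R - dropped
        kept |= level
    return set(graph) - kept
```

Every kept vertex beyond the start has exactly one neighbor among the previously kept vertices, which is why the complement of the returned set induces a tree.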



We remark that improving the approximation factor of the largest independent set problem
in $G_{n,p}$ would also improve the size of the FVS produced. Our analysis shows that most of the
vertices in the induced BFS tree get added at depth $T$ as an independent set. Moreover, the size
of this independent set is close to $(2/p)\log(np)(1 - o(1))$. Consequently, any improvement on the
approximation factor of the largest independent set problem in $G_{n,p}$ would also lead to improving
the size of the independent set found at depth $T$. This would increase the number of vertices in
the induced BFS tree and thereby reduce the number of vertices in the feedback vertex set.

Observe that Algorithm Grow-induced-BFS can be used for the directed random graph $D_{n,p}$ by
ignoring the orientation of the edges to obtain a nearly optimal feedback vertex set. Such a graph
obtained by ignoring the orientation of the edges is the random graph $G(n, 2p)$. Further, an FVS in
such a graph is also an FVS in the directed graph. Consequently, we have the following theorem.

Theorem 6. For $D_{n,p}$, there exists a polynomial time algorithm that produces an FVS of size at
most $n - (1/2p)(\log(np) - o(1))$ with probability at least 3/4.

By Theorem 3, we see that the algorithm is nearly optimal for directed random graphs.

Next, we analyze Algorithm Grow-induced-BFS to find the size of the FVS that it returns. For
$i = 0, \dots, T$, let $L_i$ be the set of surviving vertices at level $i$ with $l_i := |L_i|$, $R_{i+1}$ be the set of
unique neighbors of $L_i$ with $r_{i+1} := |R_{i+1}|$, and $U_i$ be the set of unexposed vertices of the graph
after $i$ levels of BFS exploration with $u_i := |U_i|$. Observe that $U_i := V \setminus (L_0 \cup \bigcup_{j=1}^{i} K_j)$.

We will need the following theorem due to Frieze [Fri90], about the size of the independent set.

Theorem 7. [Fri90] Let $d = np$ and $\epsilon > 0$ be fixed. Suppose $d_\epsilon \le d = o(n)$ for some sufficiently
large fixed constant $d_\epsilon$. Then, almost surely, the size of the largest independent set in $G_{n,p}$ is at least
$$\left(\frac{2}{p}\right)\left(\log np - \log\log np - \log 2 + 1 - 0.5\epsilon\right).$$

3.1 Large Set of Unique Neighbors 

The following lemma gives a concentration of the number of surviving vertices, unexposed vertices
and unique neighbors to survivors at a particular level. It shows that upon exploring $t$ levels
according to the algorithm, the number of surviving vertices at the $t$-th level, $l_t$, is not too small
while the number of unexposed vertices, $u_t$, is large. It also shows a lower bound on the number of
unique neighbors $r_{t+1}$ to a level of survivors. This fact will be used in proving Theorem 1.

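Lemma 8. Let $c := np$ and $T$ be the largest integer that satisfies $16Tp(c + 20\sqrt{c})^{T-1} < 1/2$. Then,
with probability at least 3/4, $\forall t \in \{0, 1, \dots, T - 1\}$,

1.
$$u_t \le \left(n - \sum_{i=0}^{t}(c - 20\sqrt{c})^{i}\right)\left(1 + \sqrt{\frac{\ln\ln n}{n}}\right),$$
$$u_t \ge \left(n - \sum_{i=0}^{t}(c + 20\sqrt{c})^{i}\right)\left(1 - \sqrt{\frac{\ln\ln n}{n}}\right).$$

2.
$$l_t \le (c + 20\sqrt{c})^{t},$$
$$l_t \ge (c - 20\sqrt{c})^{t}\left(1 - 16Tp(c + 20\sqrt{c})^{t}\right)\left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^{i}}{n}\right).$$

3.
$$r_{t+1} \le (c + 20\sqrt{c})^{t+1}\left(1 + \sqrt{\frac{\ln\ln n}{n}}\right),$$
$$r_{t+1} \ge (c - 20\sqrt{c})^{t+1}\left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^{i}}{n}\right)\left(1 - \sqrt{\frac{\ln\ln n}{n}}\right).$$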



Now, we are ready to prove Theorem 1.

3.2 Proof of main theorem

Proof of Theorem 1. Our objective is to use the fact that the size of the surviving set of vertices is
large when the algorithm has explored $T - 1$ levels. Moreover, the number of unexposed vertices
is also large. Thus, there is a large independent set among the unique neighbors of the surviving
vertices. This set along with the surviving vertices up to level $T - 1$ will form a large induced tree.
We will now prove that the size of the independent set among the unique neighbors of $L_{T-1}$ is
large.

By Theorem 7, if $r_T p > d_\epsilon$ for some constant $d_\epsilon$ and $r_T p = o(r_T)$, then there exists an independent
set of size $(2/p)\log(r_T p)(1 - o(1))$. It suffices to prove that $r_T$ is large and is such that
$r_T p > d_\epsilon$.

Note that the choice of
$$T = \frac{\ln(1/16p) - \ln\ln(1/16p)}{\ln(c + 20\sqrt{c})}$$
used in the algorithm satisfies the hypothesis of Lemma 8. Therefore, using Lemma 8, with probability at least 3/4, we have

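$$r_T \ge (c - 20\sqrt{c})^{T}\left(1 - \frac{\sum_{i=0}^{T-1}(c + 20\sqrt{c})^{i}}{n}\right)\left(1 - \sqrt{\frac{\ln\ln n}{n}}\right)$$
$$\ge \frac{c - 20\sqrt{c}}{64p}\left(1 - \frac{\sum_{i=0}^{T-1}(c + 20\sqrt{c})^{i}}{n}\right)\left(1 - \sqrt{\frac{\ln\ln n}{n}}\right) \ge \frac{d_\epsilon}{p}$$

for sufficiently large $c$ since

$$\left(1 - \frac{\sum_{i=0}^{T-1}(c + 20\sqrt{c})^{i}}{n}\right)\left(1 - \sqrt{\frac{\ln\ln n}{n}}\right) \ge \frac{15}{16}\cdot\frac{1}{2}.$$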

Consequently, by Theorem 7, there exists an independent set of size at least $(2/p)\log(r_T p)(1 - o(1))$.
Moreover, step 3 of the algorithm finds a 2-approximate independent set (see [GM75,
McD84]). Therefore, the size of the independent set found in step 3 is at least $(1/p)\log(r_T p)(1 - o(1))$,
which is greater than
$$\left(\frac{1}{p}\right)\log(c)(1 - o(1)) = \left(\frac{1}{p}\right)\log(np)(1 - o(1)).$$

Note that this set gets added to the tree obtained by the algorithm, which increases the number
of vertices in the tree while maintaining the acyclic property of the induced subgraph. Hence,
with probability at least 3/4, the induced subgraph has $\sum_{i=0}^{T-1} l_i + (1/p)\log(np)(1 - o(1))$ vertices.
Consequently, the FVS obtained has size at most $n - (1/p)\log(np)(1 - o(1))$ with probability at least
3/4.

□ 

4 Planted Feedback Vertex Set Problem 

We prove Theorems 4 and 5 in this section.

The proofs of Theorems 4 and 5 are based on the following fact, formalized in Lemma 9: if
$S \subseteq V \setminus P$ is a subset of vertices of size at least $(1 - \delta)n/10$, then with high probability, every
vertex $u \in P$ induces a $k$-cycle with vertices in $S$. Consequently, a small hitting set $H$ for the
$k$-cycles should contain either all vertices in $P$ or most vertices from $V \setminus P$. If some vertex $u \in P$ is
not present in $H$, then the size of $H$ will be large since it should contain most vertices from $V \setminus P$.
This contradicts the fact that $H$ is a small hitting set. Thus $H$ should contain the planted feedback
vertex set $P$. This fact is stated in a general form based on the size of $H$ in Lemma 10.

For Theorem 4, $H$ is the smallest hitting set. By the previous argument $H \supseteq P$, and we are
done since no additional vertex $v \in V \setminus P$ will be present in $H$ (in fact, $P$ is a hitting set for all
cycles since it is a feedback vertex set). We formalize these arguments in this section.

Lemma 9. Let $D_{n,\delta,p}$ be a planted directed random graph where $p > C/n^{1-2/k}$ for some constants
$C, k, \delta$. Then, with high probability, for every vertex $v \in P$, there exists a cycle of size $k$ through $v$
in the subgraph induced by $S \cup \{v\}$ in $D_{n,\delta,p}$ if $S$ is a subset of $V \setminus P$ of size at least $|V \setminus P|/10 =
(1 - \delta)n/10$.

We give a proof of this lemma by the second moment method later. It leads to the following
important consequence which will be used to prove Theorems 4 and 5. It states that every sufficiently
small hitting set for the $k$-cycles in $D_{n,\delta,p}$ should contain every vertex from the planted
feedback vertex set.

Lemma 10. Let H be a hitting set for the k-cycles in D n ^ p where p > C /n l ~ 2 / k for some constants 
C,k,5. If\H\ < tbn where t < 9(1 -5)/105, then H D P. 

Proof. Suppose u G P and u H. Then H should contain at least \V \ P\ — \V \ P\/10 vertices 
from V \P, else by Lemma [9j there exists a /c-cycle involving u and some k — 1 vertices among the 
\V \ P\/W vertices that H does not contain contradicting the fact that H hits all cycles of length 
k. Therefore, \H\ > \V\P\ - \V\ P\/10 = (1 - <5)9n/10 > t5n by the choice of t. Thus, the size of 
H is greater than tbn, a contradiction. □ 

Proof of Theorem 4. We will first show that the smallest hitting set for the $k$-cycles in $D_{n,\delta,p}$ is of size exactly $|P| = \delta n$. By Lemma 9, there exists a $k$-cycle through every vertex $v \in P$ and some $\{u_1, \ldots, u_{k-1}\} \subseteq S$ if $S \subseteq V \setminus P$ and $|S| \ge (1-\delta)n/10$.

Lemma 11. If a subset $H \subseteq V$ hits all cycles of length $k$ in $D_{n,\delta,p}$, then $|H| \ge |P|$.

Proof of Lemma 11. If $H$ contains all vertices in $P$, then we are done. Suppose not. Let $u \in P$ and $u \notin H$. Then $H$ should contain more than $|V \setminus P| - |V \setminus P|/10$ vertices from $V \setminus P$; otherwise, by Lemma 9, there exists a $k$-cycle involving $u$ and some $k-1$ vertices among the at least $|V \setminus P|/10$ vertices that $H$ does not contain. This would contradict the fact that $H$ hits all cycles of length $k$. Therefore, $|H| > |V \setminus P| - |V \setminus P|/10 = 9(1-\delta)n/10 > \delta n = |P|$ since $\delta < 9/19$. □

Therefore, every hitting set for the subset of $k$-cycles should be of size at least $|P| = \delta n$. Also, we know that $P$ is a hitting set for the $k$-cycles since $P$ is a feedback vertex set in $D_{n,\delta,p}$. Thus, the optimum hitting set for the $k$-cycles is of size exactly $|P|$.

Let $H$ be the smallest hitting set for the $k$-cycles. Then $|H| = \delta n$. It is easily verified that $t = 1$ satisfies the conditions of Lemma 10 if $\delta < 9/19$. Therefore, $H \supseteq P$. Along with the fact that $|H| = \delta n = |P|$, we conclude that $H = P$. □



4.1 Algorithm to Recover Planted Feedback Vertex Set 

In this section, we give an algorithm to recover the planted feedback vertex set in $D_{n,\delta,p}$, thereby proving Theorem 5. Theorem 4 suggests an algorithm where one would obtain all cycles of length $k$ and find the best hitting set for this set of cycles. Even though the number of $k$-cycles is polynomial, we do not have a procedure to find the best hitting set for $k$-cycles. However, by repeatedly taking all vertices of a cycle into the hitting set and removing them from the graph, we do have a simple greedy strategy that finds a $k$-approximate hitting set. We will use this strategy to give an algorithm that recovers the planted feedback vertex set.

Algorithm Recover-Planted-FVS($D_{n,\delta,p} = D(V,E)$)

1. Obtain cycles in increasing order of size until all cycles of length $k$ are obtained. Let $T'$ be the subset of cycles. Let $S$ be the empty set.

2. While there exists a cycle $T \in T'$ such that $S$ does not hit $T$,

(a) Add all vertices in $T$ to $S$.

3. Return $H$, where $H = \{u \in S : \exists\ k\text{-cycle through } u \text{ in the subgraph induced by } (V \setminus S) \cup \{u\}\}$.

The idea behind the algorithm is the following: The set $S$ obtained at the end of step 2 in the above algorithm is a $k$-approximate hitting set and hence is of size at most $k\delta n$. Using Lemma 10, it is clear that $S$ contains $P$. Indeed, if $S$ does not contain all vertices in $P$, then $S$ should contain most of the vertices in $V \setminus P$, contradicting the fact that the size of $S$ is at most $k\delta n$. Further, owing to the choice of $\delta$, it can be shown that $S$ does not contain at least $|V \setminus P|/10$ vertices from $V \setminus P$. Therefore, by Lemma 9, every vertex $v \in P$ induces a $k$-cycle with some subset of vertices from $V \setminus S$. Also, since $V \setminus P$ is a DAG, no vertex $v \in V \setminus P$ induces cycles with any subset of vertices from $V \setminus S \subseteq V \setminus P$. Consequently, a vertex $v$ induces a $k$-cycle with vertices in $V \setminus S$ if and only if $v \in P$. Thus, the vertices in $P$ are identified exactly.
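
The three steps of Recover-Planted-FVS can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names, the adjacency-dictionary representation, and the use of integer vertex labels are assumptions made here. As a simplification, step 1 enumerates only cycles of length exactly $k$ rather than all cycles up to length $k$.

```python
def k_cycles_through(adj, v, allowed, k):
    """Yield simple directed k-cycles through v whose other vertices lie in `allowed`.

    `adj` maps a vertex to the set of its out-neighbors (hypothetical representation).
    """
    def extend(path, on_path):
        last = path[-1]
        if len(path) == k:
            # Close the cycle only if there is an arc back to the start vertex.
            if v in adj.get(last, ()):
                yield tuple(path)
            return
        for w in adj.get(last, ()):
            if w in allowed and w not in on_path:
                path.append(w)
                on_path.add(w)
                yield from extend(path, on_path)
                path.pop()
                on_path.remove(w)

    yield from extend([v], {v})


def recover_planted_fvs(vertices, adj, k):
    """Sketch of Recover-Planted-FVS restricted to cycles of length exactly k."""
    vertices = set(vertices)

    # Step 1: list every k-cycle once, rooted at its smallest vertex.
    cycles = []
    for v in sorted(vertices):
        later = {w for w in vertices if w > v}
        cycles.extend(k_cycles_through(adj, v, later, k))

    # Step 2: greedy hitting set; take all vertices of any cycle not yet hit.
    S = set()
    for cycle in cycles:
        if S.isdisjoint(cycle):
            S.update(cycle)

    # Step 3: keep u in S only if it closes a k-cycle with vertices outside S.
    outside = vertices - S
    return {u for u in S
            if next(k_cycles_through(adj, u, outside, k), None) is not None}
```

On a toy instance where vertex 0 is planted and vertices 1 to 5 form a DAG, the sketch returns `{0}`. The probabilistic guarantee of Lemma 9 is of course not exercised by such a small example.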

Proof of Theorem 5. We use Algorithm Recover-Planted-FVS to recover the planted feedback vertex set from the given graph $D = D_{n,\delta,p}$. Since we are using the greedy strategy to obtain a hitting set $S$ for $T'$, it is clear that $S$ is a $k$-approximate hitting set. Therefore $|S| \le k\delta n$. It is easily verified that $t = k$ satisfies the conditions of Lemma 10 if $\delta \le 1/(2k)$. Thus, all vertices from the planted feedback vertex set $P$ are present in the subset $S$ obtained at the end of step 2 in the algorithm.

By the choice of $\delta \le 1/(2k)$, it is true that $|S| \le k\delta n < 9(1-\delta)n/10 = 9|V \setminus P|/10$. Hence, $|V \setminus S| \ge |V \setminus P|/10$.
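
The two "easily verified" conditions can be checked with exact rational arithmetic. The sketch below is only a sanity check for a few values of $k \ge 2$; the helper names are hypothetical, and the choice $\delta = 1/(2k)$ is taken as the extreme case of the assumption $\delta \le 1/(2k)$.

```python
from fractions import Fraction


def lemma10_threshold(delta):
    """Largest t permitted by Lemma 10, namely 9(1 - delta) / (10 delta)."""
    delta = Fraction(delta)
    return 9 * (1 - delta) / (10 * delta)


def greedy_set_leaves_enough_outside(k, delta):
    """Check k*delta < 9(1 - delta)/10, the bound used to keep |V minus S| large."""
    delta = Fraction(delta)
    return k * delta < Fraction(9, 10) * (1 - delta)


# Check both conditions at the boundary delta = 1/(2k) for small k.
for k in range(2, 8):
    delta = Fraction(1, 2 * k)
    assert k <= lemma10_threshold(delta)
    assert greedy_set_leaves_enough_outside(k, delta)
```

The same helper shows why Theorem 4 needs $\delta < 9/19$: at $\delta = 9/19$ the threshold equals exactly 1.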

Since $S \supseteq P$, the subset of vertices $V \setminus S$ does not contain any vertices from the planted set. Also, the number of vertices in $V \setminus S$ is at least $|V \setminus P|/10$. Consequently, by Lemma 9, each vertex $v \in P$ induces at least one $k$-cycle with vertices in $V \setminus S$. Since $V \setminus P$ is a DAG, none of the vertices $u \in V \setminus P$ induce cycles with vertices in $V \setminus S$. Therefore, a vertex $v \in S$ induces a $k$-cycle with vertices in $V \setminus S$ if and only if $v \in P$. Hence, the subset $H$ output by Algorithm Recover-Planted-FVS is exactly the planted feedback vertex set $P$.

Next we prove that the algorithm runs in polynomial time in expectation. The following lemma 
shows an upper bound on the expected number of cycles of length k. It is proved later by a simple 
counting argument. 

Lemma 12. The expected number of cycles of length $k$ in $D_{n,\delta,p}$ is at most $(nkp)^k$.

Since the expected number of cycles obtained by the algorithm is at most $(nkp)^k$ by Lemma 12, the algorithm uses $(nkp)^k$-sized storage memory. Finally, since the size of $T'$ is $(nkp)^k$, steps 2 and 3 of the algorithm can be implemented to run in expected $(nkp)^{O(k)}$ time. □
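
To get a feel for the bound in Lemma 12, the sketch below compares it with the exact expected number of directed $k$-cycles in a simpler model. The simpler model is an assumption made here for illustration: every ordered pair of distinct vertices is an arc independently with probability $p$, with no acyclicity constraint on $V \setminus P$. The function names are hypothetical.

```python
from math import perm


def expected_k_cycles_uniform(n, k, p):
    """Expected number of directed k-cycles when all n(n-1) arcs appear independently w.p. p."""
    # perm(n, k) counts ordered k-tuples of distinct vertices; each directed
    # cycle corresponds to exactly k such tuples (its rotations).
    return perm(n, k) / k * p ** k


def lemma12_bound(n, k, p):
    """The upper bound (nkp)^k stated in Lemma 12."""
    return (n * k * p) ** k
```

For example, with $n = 4$, $k = 3$ and $p = 1$ the first function counts the 8 directed triangles of the complete digraph on 4 vertices, well below the bound $12^3$.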



5 Proofs 

5.1 Lower Bound for FVS in Random Graphs 

In this section, we prove the lower bound for the Feedback Vertex Set in random graphs. We 
consider the dual problem, namely the maximum induced acyclic subgraph.

We will need the following bound on the number of ways to partition a positive integer $n$ into $k$ positive integers.

Theorem 13. [dAP07] Let $P_k(n)$ denote the number of ways to partition $n$ into exactly $k$ parts. Then there exists an absolute constant $A < 1$ such that

$$P_k(n) < A\,\frac{e^{c\sqrt{n-k}}}{(n-k)^{3/4}}\; e^{-\frac{2\sqrt{n-k}}{c}\, L_2\left(e^{-\frac{c(k+1/2)}{2\sqrt{n-k}}}\right)},$$

where $c = \pi\sqrt{2/3}$ and $L_2(x) = \sum_{m=1}^{\infty} \frac{x^m}{m^2}$ for $|x| \le 1$.

Remark 3. Since we will not need such a tight bound, we will use $P_k(n) < C_1 e^{C_2(n-k)}$ for some constants $C_1, C_2 > 0$.
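
For small arguments, $P_k(n)$ can be computed exactly with the standard recurrence $P_k(n) = P_{k-1}(n-1) + P_k(n-k)$ (either some part equals 1, or every part can be reduced by 1). The sketch below is only an illustration of the quantity being bounded; the function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions_into_k_parts(n, k):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    # Either the smallest part is 1 (remove it), or all parts are at least 2
    # (subtract 1 from each of the k parts).
    return partitions_into_k_parts(n - 1, k - 1) + partitions_into_k_parts(n - k, k)
```

For instance, 7 has four partitions into exactly 3 parts: 5+1+1, 4+2+1, 3+3+1 and 3+2+2.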

We prove Theorem 2 now based on simple counting arguments. We observe that the proof of Theorem 3 given by Spencer and Subramanian is also based on similar counting arguments while observing that if a directed graph is acyclic, then there exists an ordering of the vertices such that each arc is in the forward direction.

Proof of Theorem 2. First note that every induced subgraph on $r$ vertices is a graph from the family $G(r,p)$. We bound the probability that a graph $H = G(r,p)$ is a forest.



Pr (H is a forest) < No. of forests with spanning 

k=l niH \-n k =r,rii>0 

trees on m, • • ■ , vertices 

x Pr (Forest with k components) 

/ k \ 



E E (ralin»r- 



fc=lrtlH \-n k =r,rii>0 \lli=X "">■■ / \i=l 

r 

x p 



fc=lniH hn fc =r,rii>0 ^ 

<rI(l-p)G)£ £ (2 P r fc 

fc=l rtiH \-n k =r,n,i>0 

(since p < 1/2) 

<r\(i- P pj2(2pr k E 1 

fc=l niH hnfe=r,ni>0 

r 

= r!(l-p)(2)^(2 P r- fe p fc (r) 

k=l 

r 

< r!(l - p)(S) ^ {2pf- k de c ^ r - k ^ 

k=l 

(by Remark 3) 

r 

< Cir r (l-p)©^(2e%) r - fe 

fe=i 



1 - k
< Ci(l -p) r -n r J2( 2 e° 2 P)

k=l 

(since r < n) 

< C 1 (l-p)^r(2e C2 np) r 

< e -r-(f-log(2e c 2 np )_l2£^ir)) 



r—k 



which tends to zero when $r > (2/p)(\log np)(1 + o(1))$. □

5.2 Feedback Vertex Set in Random Graphs

We will use the following Chernoff bound for the concentration of the binomial distribution.

Lemma 14. Let $X = \sum_{i=1}^{n} X_i$ where $X_i$ are i.i.d. Bernoulli random variables with $\Pr(X_i = 1) = p$. Then

$$\Pr\left(|X - np| > a\sqrt{np}\right) \le 2e^{-a^2/2}.$$

Proof of Lemma 8. We prove the lemma by induction on $t$. We will prove the stronger induction hypothesis that every $l_i, u_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds with probability at least

$$a_t := 1 - \frac{t}{16T} - \frac{1}{16}\sum_{i=1}^{t} \frac{1}{i^2}.$$

We will prove the concentration of $r_{i+1}$ as a consequence of $l_i$ and $u_i$ satisfying their respective concentration bounds. We will in fact show that the probability that $r_{i+1}$ fails to satisfy its concentration bound, conditioned on $l_i$ and $u_i$ satisfying their respective concentration bounds, is at most $1/(32(i+1)^2)$. It immediately follows that with failure probability at most $(t/16T) + (3/32)\sum_{i=1}^{t}(1/i^2) + (1/32(t+1)^2) \le 1/4$, every $r_{i+1}$, $u_i$ and $l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfies its respective concentration bound, leading to the conclusion of the lemma.

For the base case, consider $t = 0$. It is clear that $u_0 = n - 1$ and $l_0 = 1$ satisfy the concentration bounds with probability 1. For the induction step, the induction hypothesis is the following: With probability at least $a_t$, the concentration bounds are satisfied for $u_i$ and $l_i$ for every $i \in \{0, 1, \ldots, t\}$. We will bound the probability that $u_{t+1}$ or $l_{t+1}$ fails to satisfy its corresponding concentration bound conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds.

1. To prove the concentration bound for $u_{t+1}$, observe that $u_{t+1}$ is binomially distributed with $u_t$ trials and success probability $(1-p)^{l_t}$. Indeed, $u_{t+1}$ is the number of vertices among $U_t$ which are not neighbors of vertices in $L_t$. For each vertex $x \in U_t$, $\Pr(x \text{ has no neighbor in } L_t) = (1-p)^{l_t}$.

Therefore, by Lemma 14, we have that

$$\Pr\left(\left|u_{t+1} - u_t(1-p)^{l_t}\right| > \gamma_{t+1}\sqrt{u_t(1-p)^{l_t}}\right) \le 2e^{-\gamma_{t+1}^2/2} = \frac{1}{32(t+1)^2}$$

with $\gamma_{t+1} = \sqrt{4\ln 8(t+1)}$. Hence, with probability at least $1 - (1/32(t+1)^2)$,

$$u_{t+1} \le u_t(1-p)^{l_t}\left(1 + \sqrt{\frac{4\ln 8(t+1)}{u_t(1-p)^{l_t}}}\right), \tag{1}$$

$$u_{t+1} \ge u_t(1-p)^{l_t}\left(1 - \sqrt{\frac{4\ln 8(t+1)}{u_t(1-p)^{l_t}}}\right). \tag{2}$$

Now, using the bounds on $u_t$ and $l_t$,

«t(l ~p) lt 



41n8(i + 1) < lOlnlnn 



since t + 1 < T < In n, 



Hence, 



Therefore, 



i=0 



15n 
"16"' 

15 



(1 -p(c + 20Vc)*) > ~— and 
lb 



In In n \ 1 
n ) - 2' 



Hi+l^tfl-J)) 1 ' 1 + 



In Inn 



In Inn 



n 



u t +i > u t (l - vT i 



In Inn 

n 

(Using inequality [2]) 

In Inn 

n 



> u t (l - l t p) yi - 

t(c + 2Q^)(l- C(C + 2 °^H 



> n 



x 1 



In In n 



(Using the bounds on ut and It) 
>(n-£( C + Wl-<i±^) 

j=0 x ' 



X 1 



In Inn 

n 



t 



n 



+ 



5^(c + 20 v ^) i - (c + 20^) 
(c + 20Vc)' +1 

i=0 



t+i 



ii 



J2(c + 20V~c) 1 



x 1 



In Inn 

n 



15 



In Inn 



n 



which proves the lower bound. The upper bound is obtained by proceeding similarly: 

, , / In In n 

«W<«((l-p)" 1 + 



n 



(Using inequality [I]) 



lfP\ I /In Inn 

< n l - l + 



2 J \ V n 



c(c - 20^c) 



< u t ( 1 - — — "" vw (1 - 162>(c + 20>/c)*) 



n 



1 EU(^ + 20y^)M \ / /Inlnn 



n / / \ V n 

(Using the bound on It) 



^ c(c-20V5n/ 1+ ./lnlnn 



4n / \ \ n 



Since (1 - 16Tp(c + 20-v/c)*) > 



/ 1 _EU(£±^)! 



15 

> 



< n 



n I 16 I 

E-=o(c-20Vc) l \ / c(c- 20^)* 



4n /V 4n 



/ In In n 

x 1 + 



n 

t 



< n 



E- = o(^-2Q^)M L _ (c - 20^)^+ 1 
4n y \ 4n 



/ In In n 

x 1 + 



ii 



< / ES(c-20^)- / /lnlnn 



4n / \ \ n I 

Thus, $u_{t+1}$ satisfies the concentration bound with failure probability at most $1/(32(t+1)^2)$ conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds.

2. Next we address the probability that $r_{t+1}$ fails to satisfy its concentration bound conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds. Lemma 15 proves that the number of unique neighbors $r_{t+1}$ is concentrated around its expectation.

Lemma 15. Let $q_t := p\,l_t(1-p)^{l_t - 1}$. With probability at least $1 - (1/32(t+1)^2)$,

$$q_t u_t\left(1 + \frac{20}{\sqrt{c}}\right) \ge r_{t+1} \ge q_t u_t\left(1 - \frac{20}{\sqrt{c}}\right)$$

when $t + 1 \le T$.



Proof of Lemma 15. Observe that $r_{t+1}$ is a binomially distributed random variable with $u_t$ trials and success probability $q_t$. Indeed, $r_{t+1}$ is the number of vertices among $U_t$ which are adjacent to exactly one vertex in $L_t$. For each $u \in U_t$, $\Pr(u \text{ is adjacent to exactly one vertex in } L_t) = p\,l_t(1-p)^{l_t - 1} = q_t$.
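
The closed form for $q_t$ is the probability that a Binomial($l_t$, $p$) variable equals 1. As a small sanity check, the sketch below compares it against a brute-force sum over all adjacency patterns between one vertex and a level of size $l$. The function names are hypothetical, and the brute force is only practical for small $l$.

```python
from itertools import product


def unique_neighbor_probability(l, p):
    """Closed form l * p * (1 - p)^(l - 1) for exactly one neighbor among l vertices."""
    return l * p * (1 - p) ** (l - 1)


def unique_neighbor_probability_brute_force(l, p):
    """Sum the probabilities of all 0/1 adjacency patterns with exactly one 1."""
    total = 0.0
    for pattern in product((0, 1), repeat=l):
        if sum(pattern) == 1:
            prob = 1.0
            for bit in pattern:
                prob *= p if bit else (1 - p)
            total += prob
    return total
```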



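Using $\beta_{t+1} = \sqrt{4\ln 8(t+1)}$, by Lemma 14, we have that

$$\Pr\left(\left|r_{t+1} - q_t u_t\right| > \beta_{t+1}\sqrt{q_t u_t}\right) \le 2e^{-\beta_{t+1}^2/2} = \frac{1}{32(t+1)^2}.$$

Hence, with probability at least $1 - (1/32(t+1)^2)$,

$$r_{t+1} \le q_t u_t\left(1 + \sqrt{\frac{4\ln 8(t+1)}{q_t u_t}}\right), \tag{3}$$

$$r_{t+1} \ge q_t u_t\left(1 - \sqrt{\frac{4\ln 8(t+1)}{q_t u_t}}\right). \tag{4}$$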
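
The value of the failure probability follows from a one-line computation, reading $4\ln 8(t+1)$ as $4\ln(8(t+1))$ (an interpretation assumed here, since it is the one that makes the stated equality hold):

```latex
2e^{-\beta_{t+1}^2/2}
  = 2e^{-2\ln(8(t+1))}
  = \frac{2}{\bigl(8(t+1)\bigr)^2}
  = \frac{1}{32(t+1)^2}.
```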



Lemma 16 proves the concentration of the expected number of unique neighbors of $L_t$ conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds. This in turn helps in proving that $r_{t+1}$ is concentrated.

Lemma 16. For $t + 1 \le T$, if $u_t$ and $l_t$ satisfy their respective concentration bounds, then
1. qtu t <c(c + 20^y [1+ ^™ 



2. q t u t 



> c(c-20yjy ^ _ S^(c+20V5)^ f x 



In Inn 
n
Proof of Lemma 16. Recall that $q_t = p\,l_t(1-p)^{l_t - 1}$. Hence



t 



q t u t > p{n - J^(c + 20^)%(1 - p) 1 ^ 1 



using Lemma 18 and 



For the upper bound: 



i=0 



/ In In n 

x I 1 , 

n 



pn 1- ^= uv ) k{l-p) 



In In n 

n 



! In In n 

x I 1 

n 



x (1 - l6Tp(c + 20^)*)(1 - p(c + 20-v/c)*) 



(/In Inn \ 
'-v— J 

(By the bound on It) 
c(c- 20^)' A E£o(c + 20v^ 



> ^ 1 



4 \ n 



/ In In n 

n 



(1-16^(0+20^*) > i 



(1 -p(c + 20Vc)') > J whent + l<T. 



q t u t = pl t (l -p) h 1 u t 
< phut 



4 / \ V n 

(Using the bound on lit) 



< c l i-MiM i /lnlnre 



. 1 -+ 1/ 

4n / \ V u
x 1 +



ln In n 



n 



(Using the bound on It 



< c(c + 20^)* 1 + 



In Inn 



□ 



Consequently, using Lemma 16 



4 In 8ft + 1) 400 

<2W ~~ c 



since, when t + 1 < T, 



i E£o(c + 20y^)M /15V 
n i ~ V 16 / 



In In n \ 1 

> - and 

n J ~ 2 

1 > 41n8(i + l) 



2 - (c- 20^c) r 

Hence, by inequalities 3 and 4, with probability at least $1 - (1/32(t+1)^2)$,

$$q_t u_t\left(1 + \frac{20}{\sqrt{c}}\right) \ge r_{t+1} \ge q_t u_t\left(1 - \frac{20}{\sqrt{c}}\right) \tag{5}$$

when $t + 1 \le T$. This concludes the proof of Lemma 15. □



Lemmas 15 and 16 together show that $r_{t+1}$ satisfies the concentration bounds with failure probability at most $1/(32(t+1)^2)$ conditioned on the event that $u_t$ and $l_t$ satisfy their respective concentration bounds.

3. Finally we address the probability that $l_{t+1}$ fails to satisfy its concentration bound conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds. By Step 2(e) of the algorithm, the number of surviving vertices in level $t + 1$ is $l_{t+1} := r_{t+1} - m_{t+1}$, where $m_{t+1}$ denotes the number of edges among the vertices in $R_{t+1}$. In Lemma 15, we showed that the number of unique neighbors $r_{t+1}$ is concentrated around its expectation. Lemma 17 proves a concentration bound on the number of edges among the vertices in $R_{t+1}$. These two bounds will immediately lead to the induction step on $l_{t+1}$. Thus, the probability that $l_{t+1}$ does not satisfy its concentration bound will be at most the probability that either $m_{t+1}$ or $r_{t+1}$ does not satisfy its respective concentration bound.

Lemma 17. $m_{t+1} \le 8T r_{t+1}^2 p$ with probability at least $1 - (1/16T)$.

Proof of Lemma 17. Recall that $m_{t+1}$ denotes the number of edges among the vertices in $R_{t+1}$. Since the algorithm has not explored the edges among the vertices in $R_{t+1}$, $m_{t+1}$ is a random variable following the Binomial distribution with $\binom{r_{t+1}}{2}$ trials and success probability $p$. By Markov's inequality, we have that for $t + 1 \le T$,

$$\Pr\left(m_{t+1} \ge 8T r_{t+1}^2 p\right) \le \frac{1}{16T}.$$

Hence, $m_{t+1} \le 8T r_{t+1}^2 p$ with probability at least $1 - (1/16T)$. □
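
The Markov step can be spelled out as follows. This is a worked restatement of the argument above, conditioning on the value of $r_{t+1}$; it adds no new claim.

```latex
\mathbb{E}[m_{t+1}] = \binom{r_{t+1}}{2}\, p \le \frac{r_{t+1}^2\, p}{2},
\qquad
\Pr\left(m_{t+1} \ge 8T r_{t+1}^2 p\right)
  \le \frac{\mathbb{E}[m_{t+1}]}{8T r_{t+1}^2 p}
  \le \frac{r_{t+1}^2 p / 2}{8T r_{t+1}^2 p}
  = \frac{1}{16T}.
```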
Recollect that $l_{t+1} = r_{t+1} - m_{t+1}$. The upper bound of the induction step follows using Lemma



k+i < n+i 



< q t u t 1 + 



20 



< c(c + 20v^)* 1 + 



< (c + 20Vc) m I 1 + 



In Inn 



n 



In In n 



n 



1 + 



20 



For the lower bound, we use Lemmas 15 and 17 conditioned on the event that $l_t$ and $u_t$ satisfy their respective concentration bounds. With failure probability at most $\frac{1}{32(t+1)^2} + \frac{1}{16T}$, we have that

$$l_{t+1} = r_{t+1} - m_{t+1} \ge r_{t+1} - 8T r_{t+1}^2 p = r_{t+1}\left(1 - 8T r_{t+1} p\right)$$
20 
~7c 



> qtu t l 



(Using Lemma 15) 
l tP {l - pf^ut (l - 8Tl tP 2 {l - p) h - 1 



Ut 



x 1 + 



(Substituting for q t = pl t (l — p) 



>kp[l 



20 



(l-ltp)(l-12Tl t p l (l-p) lt - 1 ut) 



20 



> l tP (l - ^\ (1 - l t p){l - YlTnp\{\ - pf- 1 ) 

(Since ut < n) 

> l tP (l - ^\ (1 - l t p){l - 12Tcpl t (l - pf- 1 ) 

> hp (l - ^=)0-- kp-l2Tcpl t {l-p) h {l - hp)) 

> l tP (l - ^\ (1 - l t p(l + 12Tc)) 

/ 20 

> l t pu t {\ - l t p(l + 12Tc)) I 1 - -j= 

t 

> l t p{n - ^2 (c + 20^)0(1 " «tP(l + 12Tc)) 

i=0 

x ( 1 — J (Using the bound on ut) 



> Z 4 np 



1 _EU(c + 20VB)- (1 _ W1 + I2re)) 



n 

20' 



n 1 ^ 

/, 20 

>e(c-20^)'(l-£k(£±^>:y 

x (1 - 16Tp(c + 20Vc)') 

(1 - (c + 20^/c)*p(l + 12Tc)) (\ - 
(using the bound on l t ) 



X 



x (1 - 167>(c + 20Vc)' +1 ) (Using Lemma[l8]) 

proving the induction step of the lower bound for $l_{t+1}$.

Thus, $l_{t+1}$ satisfies the concentration bounds with failure probability at most $(1/32(t+1)^2) + (1/16T)$ conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds.

Finally, by the union bound, with probability at most $\frac{1}{32(t+1)^2} + \frac{1}{32(t+1)^2} + \frac{1}{16T}$, either $u_{t+1}$ or $l_{t+1}$ does not satisfy its respective concentration bounds conditioned on the event that $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfy their respective concentration bounds. By the induction hypothesis, the probability that some $u_i, l_i$ for $i \in \{0, 1, \ldots, t\}$ fails to satisfy its concentration bound is at most $1 - a_t$. Hence, the probability that $u_i, l_i$ satisfy their respective concentration bounds for every $i \in \{0, 1, \ldots, t+1\}$ is at least $a_t\left(1 - (1/16(t+1)^2) - (1/16T)\right) \ge a_{t+1}$. Therefore, with probability at least $a_{t+1}$, every $u_i, l_i$ for $i \in \{0, 1, \ldots, t+1\}$ satisfy their respective concentration bounds. This proves the stronger induction hypothesis.

To complete the proof of Lemma 8, recollect that we showed that the probability that $r_{i+1}$ fails to satisfy its concentration bound, conditioned on $l_i$ and $u_i$ satisfying their respective concentration bounds, is at most $1/(32(i+1)^2)$. By the union bound argument, it immediately follows that with failure probability at most $(t/16T) + (3/32)\sum_{i=1}^{t}(1/i^2) + (1/32(t+1)^2) \le 1/4$, every $r_{i+1}$, $u_i$ and $l_i$ for $i \in \{0, 1, \ldots, t\}$ satisfies its respective concentration bound. This concludes the proof of Lemma 8. □
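
The final numerical claim, that the total failure probability stays below 1/4, can be checked directly. The sketch below assumes $0 \le t \le T$ and bounds each of the three terms separately, using $\sum_{i \ge 1} 1/i^2 = \pi^2/6$; the function name is hypothetical.

```python
import math


def total_failure_bound(t, T):
    """(t/16T) + (3/32) * sum_{i=1}^{t} 1/i^2 + 1/(32 (t+1)^2)."""
    partial_sum = sum(1 / i ** 2 for i in range(1, t + 1))
    return t / (16 * T) + (3 / 32) * partial_sum + 1 / (32 * (t + 1) ** 2)


# Term-by-term worst case: t/(16T) <= 1/16, the partial sum is below pi^2/6,
# and 1/(32 (t+1)^2) <= 1/32.
worst_case = 1 / 16 + (3 / 32) * (math.pi ** 2 / 6) + 1 / 32
```

The worst case evaluates to roughly 0.248, so the bound of 1/4 holds with a small margin.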

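Lemma 18. For $t + 1 \le T$,

1. $$\left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n}\right)^2 \ge 1 - \frac{\sum_{i=0}^{t+1}(c + 20\sqrt{c})^i}{n}$$

2. $$\left(1 - 16Tp(c + 20\sqrt{c})^t\right)\left(1 - (c + 20\sqrt{c})^t p(1 + 12Tc)\right) \ge \left(1 - 16Tp(c + 20\sqrt{c})^{t+1}\right)$$
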

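Proof of Lemma 18. We prove the first part of the lemma by induction. For the base case, we need to prove that

$$1 + \frac{1}{n^2} - \frac{2}{n} \ge 1 - \frac{c + 20\sqrt{c}}{n} - \frac{1}{n},$$

i.e., to prove that $n - 1 \le (c + 20\sqrt{c})n$, which is true. For the induction step, we need to prove that

$$\left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n} - \frac{(c + 20\sqrt{c})^{t+1}}{n}\right)^2 \ge 1 - \frac{\sum_{i=0}^{t+2}(c + 20\sqrt{c})^i}{n}.$$

Now,

$$\text{LHS} = \left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n}\right)^2 + \frac{(c + 20\sqrt{c})^{2t+2}}{n^2} - \frac{2(c + 20\sqrt{c})^{t+1}}{n}\left(1 - \frac{\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n}\right)$$

$$\ge 1 - \frac{\sum_{i=0}^{t+1}(c + 20\sqrt{c})^i}{n} + \frac{(c + 20\sqrt{c})^{2t+2}}{n^2} - \frac{2(c + 20\sqrt{c})^{t+1}}{n} + \frac{2(c + 20\sqrt{c})^{t+1}\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n^2}.$$

Hence, it is sufficient to prove that

$$-\frac{(c + 20\sqrt{c})^{t+2}}{n} \le \frac{(c + 20\sqrt{c})^{2t+2}}{n^2} - \frac{2(c + 20\sqrt{c})^{t+1}}{n} + \frac{2(c + 20\sqrt{c})^{t+1}\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n^2},$$

i.e.,

$$(c + 20\sqrt{c}) \ge 2 - \frac{(c + 20\sqrt{c})^{t+1}}{n} - \frac{2\sum_{i=0}^{t}(c + 20\sqrt{c})^i}{n},$$

which is true for large enough $c$ when $t + 1 \le T$.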

For the second part of the lemma, we need to prove that

$$\left(1 - 16Tp(c + 20\sqrt{c})^t\right)\left(1 - (c + 20\sqrt{c})^t p(1 + 12Tc)\right) \ge 1 - 16Tp(c + 20\sqrt{c})^{t+1},$$

i.e.,

$$1 - 16Tp(c + 20\sqrt{c})^t - (c + 20\sqrt{c})^t p(1 + 12Tc) + 16Tp^2(c + 20\sqrt{c})^{2t}(1 + 12Tc) \ge 1 - 16Tp(c + 20\sqrt{c})^{t+1},$$

i.e.,

$$\left(1 - 16Tp(c + 20\sqrt{c})^t\right)(1 + 12Tc) \le 16T(c + 20\sqrt{c} - 1),$$

which is true since $1 + 12Tc \le 16T(c + 20\sqrt{c} - 1)$ for large $c$ and the rest of the terms are less than 1 when $t + 1 \le T$.

□

5.3 Planted Feedback Vertex Set 

We prove Lemma 9 by the second moment method.

Proof of Lemma 9. Let $S \subseteq V \setminus P$, $|S| \ge (1-\delta)n/10$, $v \in P$. Let $X_v$ denote the number of cycles of size $k$ through $v$ in the subgraph induced by $S \cup \{v\}$. Then, $E(X_v) = \binom{(1-\delta)n/10}{k-1} p^k$. Using Chebyshev's inequality, we can derive that

$$\Pr(X_v = 0) \le \frac{\mathrm{Var}(X_v)}{E(X_v)^2}.$$

To compute the variance of $X_v$, we write $X_v = \sum_{A \subseteq S : |A| = k-1} X_A$, where the random variable $X_A$ is 1 when the vertices in $A$ induce a cycle of length $k$ with $v$ and 0 otherwise.

$$\mathrm{Var}(X_v) \le E(X_v) + \sum_{A, B \subseteq S : |A| = |B| = k-1,\, A \ne B} \mathrm{Cov}(X_A, X_B)$$
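
Both displayed inequalities follow from standard facts; the short derivation below is a worked restatement, using only that each $X_A$ is a 0/1 indicator.

```latex
% Chebyshev: X_v = 0 forces |X_v - E(X_v)| >= E(X_v).
\Pr(X_v = 0) \le \Pr\bigl(|X_v - E(X_v)| \ge E(X_v)\bigr)
             \le \frac{\mathrm{Var}(X_v)}{E(X_v)^2}.

% Variance of a sum of indicators: Var(X_A) <= E(X_A^2) = E(X_A).
\mathrm{Var}(X_v) = \sum_{A} \mathrm{Var}(X_A) + \sum_{A \ne B} \mathrm{Cov}(X_A, X_B)
                  \le \sum_{A} E(X_A) + \sum_{A \ne B} \mathrm{Cov}(X_A, X_B)
                  = E(X_v) + \sum_{A \ne B} \mathrm{Cov}(X_A, X_B).
```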

Now, for any fixed subsets A,BQS, \ A\ = \B\ = k - 1 and \AnB\ = r, Cov (X A , X B ) < p 2k -' r 
and the number of such subsets is at most ( 2 fcl^_ r ) C) — (2k-2-r) (r) - Therefore, 

E v Cov (X a ,Xb) 

r=0 A,BCS:\A\=\B\=k-l,\AnB\=r y 1
(for some constants $C_r$ dependent on $r, \delta$)

$\to 0$

as $n \to \infty$ if $p \ge C/n^{1-2/k}$ for some sufficiently large constant $C$, since each term in the summation tends to 0 and the summation is over a finite number of terms. Thus

Pr{X v = Q) < (1 a) /10 < ((i_,5) n /io)*-ipfc- 

Therefore, 

Pr (X„ > 1) > 1 ' 



((1 - 5)n/10) k ~ 1 p k 
and hence 

1^1 



Pr (X v > 1W € P) > 1 



((1 - <5)n/10) fc - V 



2 \ 5n 

1_ ((l-5)n/10) fe -V 



> g 2(l-i)'=- 1 n fe - 2 p fe _^ ]_ 

as $n \to \infty$ if $p \ge C/n^{1-2/k}$ for some large constant $C$. □

Finally, we prove Lemma 12 by computing the expectation.

Proof of Lemma 12. E(Number of cycles of length $k$)



£ffi( (1 r_?V 



, i J V fc — i 

i=l x 7 v 



<^(<5n) l ((l-5)n) fc - l (M" 



i=i 



((l-^nM'Efr^) 4 

((l-<5)nfcp) fc (l-5) < (nfcp)*. 



□

6 Conclusion



Several well-known combinatorial problems can be reformulated as hitting set problems with an 
exponential number of subsets to hit. However, there exist efficient procedures to verify whether a 
candidate set is a hitting set and if not, output a subset that is not hit. We introduced the implicit 
hitting set as a framework to encompass such problems. The motivation behind introducing this 
framework is in obtaining efficient algorithms where efficiency is determined by the running time as 
a function of the size of the ground set. We initiated the study towards developing such algorithms 
by showing an algorithm for a combinatorial problem that falls in this framework - the feedback 
vertex set problem on random graphs. It would be interesting to extend our results to other implicit 
hitting set problems mentioned in Section 1.1.